- South America > Brazil (0.04)
- North America > United States (0.04)
- North America > Canada (0.04)
- (6 more...)
- Information Technology (0.67)
- Law (0.67)
- Government (0.46)
IA-VLA: Input Augmentation for Vision-Language-Action models in settings with semantically complex tasks
Hannus, Eric, Malin, Miika, Le, Tran Nguyen, Kyrki, Ville
Figure 1: Semantically complex language instructions, such as those involving the relative positions of objects, pose a difficult challenge for vision-language-action models (VLAs). To address this problem, we propose IA-VLA, a framework that augments the input to a VLA by offloading semantic understanding to a larger vision-language model (VLM). We use semantic segmentation to label image regions, which the VLM then uses to identify the masks of the task-relevant objects. These objects are highlighted in the VLA input, together with the language instruction, which can optionally be simplified.
Abstract-- Vision-language-action models (VLAs) have become an increasingly popular approach to robot manipulation in recent years. However, such models must output actions at a rate suitable for robot control, which limits the size of the language model they can be based on and, consequently, their language understanding capabilities. Manipulation tasks may require complex language instructions, such as identifying target objects by their relative positions, to specify human intention. We therefore introduce IA-VLA, a framework that utilizes the extensive language understanding of a large vision-language model as a pre-processing stage to generate improved context that augments the input of a VLA. We evaluate the framework on a set of semantically complex tasks that have been underexplored in the VLA literature, namely tasks involving visual duplicates, i.e., visually indistinguishable objects.
- Europe > Finland > Northern Ostrobothnia > Oulu (0.04)
- Europe > Denmark (0.04)
- Europe > Switzerland (0.04)
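As a rough illustration of the kind of input augmentation IA-VLA describes, the sketch below labels segmentation masks, lets a stand-in for the larger VLM pick the task-relevant regions, and highlights them in the image handed to the VLA. All names, and the keyword-matching `fake_vlm` in particular, are illustrative assumptions, not the paper's code.

```python
# Hypothetical sketch of an IA-VLA-style pre-processing stage.
# `vlm_query` stands in for a call to a large vision-language model.

def select_relevant_masks(instruction, region_labels, vlm_query):
    """Ask the larger VLM which labeled regions the instruction refers to."""
    return vlm_query(instruction, region_labels)

def augment_vla_input(image, masks, relevant_ids, highlight=255):
    """Highlight the task-relevant regions in the image passed to the VLA."""
    out = [row[:] for row in image]
    for rid in relevant_ids:
        for r, c in masks[rid]:
            out[r][c] = highlight
    return out

# Toy example: two labeled regions and a keyword-matching stand-in "VLM".
image = [[0] * 4 for _ in range(3)]
masks = {1: [(0, 0), (0, 1)], 2: [(2, 3)]}
labels = {1: "red block", 2: "blue block"}
fake_vlm = lambda instr, lab: [i for i, name in lab.items()
                               if name.split()[0] in instr]
relevant = select_relevant_masks("pick up the red block", labels, fake_vlm)
augmented = augment_vla_input(image, masks, relevant, highlight=9)
```

A real pipeline would pass actual segmentation masks and query a VLM over the labeled image; the structure of select-then-highlight is the part this sketch tries to convey.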
An AI-Powered Framework for Analyzing Collective Idea Evolution in Deliberative Assemblies
Poole-Dayan, Elinor, Roy, Deb, Kabbara, Jad
In an era of increasing societal fragmentation, political polarization, and erosion of public trust in institutions, representative deliberative assemblies are emerging as a promising democratic forum for developing effective policy outcomes on complex global issues. Despite theoretical attention, there remains limited empirical work that systematically traces how specific ideas evolve, are prioritized, or are discarded during deliberation to form policy recommendations. Addressing these gaps, this work poses two central questions: (1) How might we trace the evolution and distillation of ideas into concrete recommendations within deliberative assemblies? (2) How does the deliberative process shape delegate perspectives and influence voting dynamics over the course of the assembly? To address these questions, we develop LLM-based methodologies for empirically analyzing transcripts from a tech-enhanced in-person deliberative assembly. The framework identifies and visualizes the space of expressed suggestions. We also empirically reconstruct each delegate's evolving perspective throughout the assembly. Our methods contribute novel empirical insights into deliberative processes and demonstrate how LLMs can surface high-resolution dynamics otherwise invisible in traditional assembly outputs.
- Asia > Middle East > Republic of Türkiye > Konya Province > Konya (0.04)
- South America > Argentina (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- (5 more...)
- Education > Educational Setting (0.67)
- Government (0.66)
- Energy > Renewable (0.46)
- Water & Waste Management > Solid Waste Management (0.46)
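The paper's LLM-based idea tracing is far richer than any toy, but the core step of grouping near-duplicate suggestions into an idea space can be sketched with a simple lexical similarity. The Jaccard measure and the threshold below are illustrative stand-ins for embedding- or LLM-based similarity.

```python
def jaccard(a, b):
    """Word-overlap similarity between two suggestions."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)

def cluster_suggestions(suggestions, threshold=0.5):
    """Greedy single-pass clustering: attach each suggestion to the first
    cluster whose seed it resembles, otherwise start a new cluster."""
    clusters = []
    for s in suggestions:
        for c in clusters:
            if jaccard(s, c[0]) >= threshold:
                c.append(s)
                break
        else:
            clusters.append([s])
    return clusters

ideas = ["ban plastic bags", "ban plastic straws", "subsidize public transit"]
grouped = cluster_suggestions(ideas)
```

Tracking how cluster membership changes across assembly sessions is one way such a grouping could support the evolution analysis the abstract describes.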
DS@GT at CheckThat! 2025: Ensemble Methods for Detection of Scientific Discourse on Social Media
Parikh, Ayush, Truong, Hoang Thanh Thanh, Schofield, Jeanette, Heil, Maximilian
In this paper, we present the methods the DS@GT team explored for CLEF 2025 CheckThat! Task 4a, Scientific Web Discourse Detection. For this multiclass classification task, we determined whether a tweet contained a scientific claim, a reference to a scientific study or publication, and/or mentions of scientific entities, such as a university or a scientist. We present three modeling approaches: transformer fine-tuning, few-shot prompting of LLMs, and a combined ensemble model whose design was informed by earlier experiments. Our team placed 7th in the competition, achieving a macro-averaged F1 score of 0.8611, an improvement over the DeBERTaV3 baseline's 0.8375. Our code is available on GitHub at https://github.com/dsgt-arc/checkthat-2025-swd/tree/main/subtask-4a.
- North America > United States > Georgia > Fulton County > Atlanta (0.14)
- North America > United States > New York > New York County > New York City (0.04)
- Europe > Spain > Galicia > Madrid (0.04)
- Europe > Italy > Tuscany > Florence (0.04)
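One common way to combine a fine-tuned transformer with few-shot LLM predictions on a task like this, where each tweet can carry several labels, is per-label majority voting. The sketch below is a generic illustration of that idea, not the team's actual ensemble design.

```python
def majority_vote(predictions):
    """Per-label majority vote over several models' boolean predictions.

    `predictions` is a list of dicts mapping label -> bool, one per model.
    """
    n = len(predictions)
    labels = predictions[0]
    return {lab: 2 * sum(p[lab] for p in predictions) > n for lab in labels}

# Hypothetical outputs from three models on one tweet.
model_outputs = [
    {"claim": True,  "reference": False, "entity": True},   # fine-tuned model
    {"claim": True,  "reference": True,  "entity": False},  # few-shot LLM
    {"claim": False, "reference": False, "entity": True},   # second LLM
]
combined = majority_vote(model_outputs)
```

Weighted voting or stacking a meta-classifier on the models' scores are natural refinements of the same structure.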
BACON: Supercharge Your VLM with Bag-of-Concept Graph to Mitigate Hallucinations
Yang, Zhantao, Feng, Ruili, Yan, Keyu, Wang, Huangji, Wang, Zhicai, Zhu, Shangwen, Zhang, Han, Xiao, Jie, Wu, Pingyu, Zhu, Kai, Chen, Jixuan, Xie, Chen-Wei, Mao, Chaojie, Yang, Yue, Zhang, Hongyang, Liu, Yu, Cheng, Fan
This paper presents the Bag-of-Concept Graph (BACON), which equips models with limited linguistic abilities with the capabilities of Vision-Language Models (VLMs) and boosts downstream tasks such as detection, visual question answering (VQA), and image generation. Since visual scenes in the physical world are structured by complex relations between objects, BACON breaks annotations down into minimal basic elements and presents them in a graph structure. The element-wise style makes the annotations easy to understand, and the structural composition makes objects easy to locate. Careful prompt design produces BACON captions with the help of publicly available VLMs and segmentation methods. In this way, we gather a dataset of 100K annotated images, which endows VLMs with remarkable capabilities, such as accurately generating BACON, transforming prompts into the BACON format, envisioning scenarios in the style of BACON, and dynamically modifying elements within BACON through interactive dialogue, among others. Extensive representative experiments on detection, VQA, and image generation tasks show that BACON enables previously out-of-reach tasks or improves on current cutting-edge solutions.
- Transportation > Passenger (0.93)
- Transportation > Ground > Road (0.68)
- Energy > Oil & Gas (0.68)
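A bag-of-concept graph of the kind the abstract describes, with objects as attributed nodes and relations as edges, can be sketched as a small data structure. The class and its methods below are hypothetical illustrations of the idea, not BACON's actual annotation format.

```python
from dataclasses import dataclass, field

@dataclass
class ConceptGraph:
    """Minimal bag-of-concept graph: objects carry attributes,
    relations are (subject, predicate, object) triples."""
    objects: dict = field(default_factory=dict)
    relations: list = field(default_factory=list)

    def add_object(self, oid, **attrs):
        self.objects[oid] = attrs

    def relate(self, subj, predicate, obj):
        self.relations.append((subj, predicate, obj))

    def describe(self):
        """Flatten the graph back into a caption-like string."""
        parts = [oid + "(" + ", ".join(k + "=" + str(v) for k, v in a.items()) + ")"
                 for oid, a in self.objects.items()]
        parts += [s + " " + p + " " + o for s, p, o in self.relations]
        return "; ".join(parts)

g = ConceptGraph()
g.add_object("cat", color="black")
g.add_object("mat", material="straw")
g.relate("cat", "sits_on", "mat")
```

Splitting a caption into such element-wise pieces is what lets a model edit one object or relation without re-describing the whole scene.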
Unit Test Generation using Generative AI : A Comparative Performance Analysis of Autogeneration Tools
Bhatia, Shreya, Gandhi, Tarushi, Kumar, Dhruv, Jalote, Pankaj
Generating unit tests is a crucial task in software development, demanding substantial time and effort from programmers. The advent of Large Language Models (LLMs) introduces a novel avenue for unit test script generation. This research experimentally investigates the effectiveness of LLMs, exemplified by ChatGPT, at generating unit test scripts for Python programs, and how the generated test cases compare with those produced by an existing unit test generator (Pynguin). For the experiments, we consider three types of code units: 1) procedural scripts, 2) function-based modular code, and 3) class-based code. The generated test cases are evaluated on criteria such as coverage, correctness, and readability. Our results show that ChatGPT's performance is comparable with Pynguin's in terms of coverage, while ChatGPT's ability to generate tests is broader, as Pynguin cannot generate test cases for Category 1. We also find that about 39% and 28% of the assertions generated by ChatGPT for Categories 2 and 3, respectively, were incorrect. Our results also show that there is minimal overlap in missed statements between ChatGPT and Pynguin, suggesting that a combination of both tools may enhance unit test generation performance. Finally, prompt engineering improved ChatGPT's coverage by an average of 28% in Category 2 and 15% in Category 3 after about four iterations.
- Europe > Portugal > Lisbon > Lisbon (0.05)
- North America > United States > New York > New York County > New York City (0.05)
- Asia > India > NCT > Delhi (0.05)
- (5 more...)
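The observation that ChatGPT and Pynguin miss largely disjoint statements is what makes combining the two suites attractive: a statement stays uncovered only if both tools miss it. Assuming, for illustration, that we have each tool's set of missed statement numbers, the combined result is a set intersection.

```python
def combined_coverage(total, missed_a, missed_b):
    """Coverage after merging two test suites: a statement remains
    uncovered only if both tools miss it."""
    still_missed = missed_a & missed_b
    return still_missed, (total - len(still_missed)) / total

# Hypothetical missed-statement sets for a 10-statement code unit.
missed_chatgpt = {3, 7, 9}
missed_pynguin = {7, 8}
still_missed, coverage = combined_coverage(10, missed_chatgpt, missed_pynguin)
```

In practice these sets would come from a coverage tool's per-line report for each suite; the arithmetic of the merge is all this sketch shows.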
UAVs and Birds: Enhancing Short-Range Navigation through Budgerigar Flight Studies
Rahman, Md. Mahmudur, Islam, Sajid, Chowdhury, Showren, Zeba, Sadia Jahan, Karmaker, Debajyoti
This study delves into the flight behaviors of Budgerigars (Melopsittacus undulatus) to gain insights into their flight trajectories and movements. Using 3D reconstruction from stereo video camera recordings, we closely examine velocity and acceleration patterns during three flight motions: takeoff, flying, and landing. The findings not only contribute to our understanding of bird behavior but also hold significant implications for the advancement of algorithms for Unmanned Aerial Vehicles (UAVs). The research aims to bridge the gap between biological principles observed in birds and the application of these insights in developing more efficient and autonomous UAVs. In the context of the increasing use of drones, this study focuses on biologically inspired principles drawn from bird behavior, particularly during takeoff, flight, and landing, to enhance UAV capabilities. The dataset created for this research sheds light on Budgerigars' takeoff, flying, and landing techniques, emphasizing their ability to control speed across different situations and surfaces. The study underscores the potential of incorporating these principles into UAV algorithms, addressing challenges related to short-range navigation, takeoff, flying, and landing.
- North America > United States (0.14)
- Asia > Bangladesh (0.04)
- Aerospace & Defense (0.88)
- Transportation > Air (0.66)
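Velocity and acceleration patterns of the kind extracted from the 3D reconstructions can be recovered from a sampled trajectory by finite differences. The sketch below assumes positions sampled at a fixed interval `dt` and is an illustration of the arithmetic, not the authors' pipeline.

```python
def finite_differences(positions, dt):
    """Forward-difference velocity and acceleration along a 3D trajectory.

    `positions` is a list of (x, y, z) samples taken every `dt` seconds.
    """
    vel = [tuple((b - a) / dt for a, b in zip(p0, p1))
           for p0, p1 in zip(positions, positions[1:])]
    acc = [tuple((b - a) / dt for a, b in zip(v0, v1))
           for v0, v1 in zip(vel, vel[1:])]
    return vel, acc

# Constant acceleration of 2 units/s^2 along x: x(t) = t^2.
traj = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (4.0, 0.0, 0.0), (9.0, 0.0, 0.0)]
vel, acc = finite_differences(traj, dt=1.0)
```

Real stereo-reconstructed trajectories are noisy, so a smoothing filter would normally precede the differencing step.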
An Investigation of Indian Native Language Phonemic Influences on L2 English Pronunciations
Jain, Shelly, Pal, Priyanshi, Vuppala, Anil, Ghosh, Prasanta, Yarra, Chiranjeevi
Speech systems are sensitive to accent variations. This is especially challenging in the Indian context, with an abundance of languages but a dearth of linguistic studies characterising pronunciation variations. The growing number of L2 English speakers in India reinforces the need to study accents and L1-L2 interactions. We investigate the accents of Indian English (IE) speakers and report in detail our observations, both specific and common to all regions. In particular, we observe the phonemic variations and phonotactics occurring in the speakers' native languages and relate these to their English pronunciations. We demonstrate the influence of 18 Indian languages on IE by comparing the native language pronunciations with IE pronunciations obtained jointly from existing literature studies and phonetically annotated speech of 80 speakers. Consequently, we are able to validate the intuitions of Indian language influences on IE pronunciations by justifying pronunciation rules from the perspective of Indian language phonology. We obtain a comprehensive description in terms of universal and region-specific characteristics of IE, which facilitates accent conversion and adaptation of existing ASR and TTS systems to different Indian accents.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.05)
- Europe > Netherlands (0.04)
- Asia > Southeast Asia (0.04)
- (3 more...)
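Pronunciation rules of the sort the study validates can be thought of as phoneme substitutions applied to a canonical transcription. The rule set below, such as the v/w conflation reported for some Indian English accents, is purely illustrative; real rules are conditioned on phonetic context, not applied symbol by symbol.

```python
def apply_accent_rules(phonemes, rules):
    """Map a canonical phoneme sequence through L1-influenced substitutions.

    `rules` maps a source phoneme to its accented realization; phonemes
    without a rule pass through unchanged.
    """
    return [rules.get(p, p) for p in phonemes]

# Illustrative rules: v/w conflation and a stop substitution for "th".
rules = {"v": "w", "th": "t"}
accented = apply_accent_rules(["v", "e", "r", "i", "th", "i", "n"], rules)
```

A rule table like this, derived per region, is one way such findings could plug into accent adaptation for ASR or TTS lexicons.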
Machine-Learning Model Improves Gas Lift Performance and Well Integrity
The main objective of this work is to use machine-learning (ML) algorithms to develop a powerful model that predicts the well-integrity (WI) risk categories of gas-lifted wells. The model described in the complete paper can predict well-risk level and provides a unique method to convert the associated failure risk of each element in the well envelope into tangible values. The predictive model classifies well integrity into five categories rather than the three broad categories used in qualitative risk classification:
- Category 1: too risky
- Category 2: still too risky, but less so than Category 1
- Category 3: medium risk, which can be elevated if additional barrier failures occur
- Category 4: low risk, but with some impaired barriers
- Category 5: the lowest risk
A separate failure model identifies whether a well is considered to be in failure mode. In addition, the model can identify wells that require prompt mitigation.
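A rule-based reading of the five categories might look like the sketch below. The thresholds and the max/count aggregation over barrier risks are illustrative assumptions; the paper's model is learned from data rather than hand-written.

```python
def risk_category(barrier_risks, fail_at=0.8):
    """Collapse per-barrier failure risks in [0, 1] into five levels,
    from Category 1 (too risky) down to Category 5 (lowest risk)."""
    worst = max(barrier_risks)
    failed = sum(r >= fail_at for r in barrier_risks)
    if failed >= 2:
        return 1  # too risky: multiple barriers at failure level
    if failed == 1:
        return 2  # still too risky, but less so than Category 1
    if worst >= 0.5:
        return 3  # medium risk; may escalate with further failures
    if worst >= 0.2:
        return 4  # low risk, but some barriers are impaired
    return 5      # lowest risk
```

An ML model would replace these hand-set thresholds with boundaries learned from labeled well histories, but the input (per-barrier failure risks) and output (a five-level category) match the scheme described above.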